ABSTRACT

This paper investigates the MaxRS problem in spatial 
databases. Given a set O of weighted points and a rectangu- 
lar region r of a given size, the goal of the MaxRS problem is 
to find a location of r such that the sum of the weights of all 
the points covered by r is maximized. This problem is use- 
ful in many location-based applications such as finding the 
best place for a new franchise store with a limited delivery 
range and finding the most attractive place for a tourist with 
a limited reachable range. However, the problem has been
studied mainly in theory, particularly in computational
geometry. The existing algorithms from the computational
geometry community are in-memory algorithms, which do
not guarantee scalability. In this paper, we propose a
scalable external-memory algorithm (ExactMaxRS) for the 
MaxRS problem, which is optimal in terms of the I/O com- 
plexity. Furthermore, we propose an approximation algo- 
rithm (ApproxMaxCRS) for the MaxCRS problem that is 
a circle version of the MaxRS problem. We prove the cor- 
rectness and optimality of the ExactMaxRS algorithm along 
with the approximation bound of the ApproxMaxCRS algo- 
rithm. From extensive experimental results, we show that 
the ExactMaxRS algorithm is two orders of magnitude faster 
than methods adapted from existing algorithms, and the 
approximation bound in practice is much better than the 
theoretical bound of the ApproxMaxCRS algorithm. 

1. INTRODUCTION 

In the era of mobile devices, location-based services are 
being used in a variety of contexts such as emergency, nav- 
igation, and tour planning. Essentially, these applications 
require managing and processing a large amount of location 
information, and technologies studied in spatial databases 
are getting a great deal of attention for this purpose.
Traditional research in spatial databases, however, has mostly
focused on retrieving objects (e.g., range search, nearest
neighbor search, etc.), rather than finding the best location 
to optimize a certain objective. 

Recently, several location selection problems [9, 16, 18, 19, 
20, 21, 22, 23] have been proposed. One type of these prob- 
lems is to find a location for a new facility by applying the 
well-known facility location problem in theory to database 
problems such as optimal-location queries and bichromatic 
reverse nearest neighbor queries. Another type of location 
selection problems is to choose one of the predefined can- 
didate locations based on a given ranking function such as 
spatial preference queries. 

In this paper, we solve the maximizing range sum (MaxRS) 
problem in spatial databases. Given a set O of weighted 
points (a.k.a. objects) and a rectangle r of a given size, the 
goal of the MaxRS problem is to find a location of r which 
maximizes the sum of the weights of all the objects covered 
by r. Figure 1 shows an instance of the MaxRS problem
where the size of r is specified as d1 × d2. In this example,
if we assume that the weights of all the objects are
equally set to 1, the center point of the rectangle in solid 
line is the solution, since it covers the largest number of ob- 
jects which is 8. The figure also shows some other positions 
for r, but it should be noted that there are infinitely many 
such positions - r can be anywhere in the data space. The 
MaxRS problem is different from existing location selection 
problems mentioned earlier in that there are no predefined 
candidate locations or other facilities to compete with. Fur- 
thermore, this problem is also different from range aggregate 
queries [17] in the sense that we do not have a known query 
rectangle, but rather, must discover the best rectangle in 
the data space. 

Figure 1: An instance of the MaxRS problem

In practice, there can be many kinds of facilities that 
should be associated with a region of a certain size. For 
example, if we open, in an area with a grid-shaped road network,
a new pizza franchise store that has a limited delivery
range, it is important to maximize the number of residents in 
a rectangular area around the pizza store. This is the case 
of finding a more profitable place to set up a new service 
facility. 

For an opposite case, the MaxRS problem can be applied 
to find a more serviceable place for client users. Consider 
a tourist who wants to find the most representative spot in 
a city. In this case, the tourist will prefer to visit as many 
attractions as possible around the spot, and at the same 
time s/he usually does not want to go too far away from the 
spot. 

There has been little research on this natural problem in
the database community. In fact, this problem has been 
mainly studied in the computational geometry community. 
The first optimal in-memory algorithm for finding the posi- 
tion of a fixed-size rectangle enclosing the maximum number 
of points was proposed in [11]. Later, a solution to the prob- 
lem of finding the position of a fixed-size circle enclosing the 
maximum number of points was provided in [4]. 

Unfortunately, these in-memory algorithms are not scal- 
able for processing a large number of geographic objects 
in real applications such as residential buildings and mo- 
bile customers, since they are developed based on the as- 
sumption that the entire dataset can be loaded in the main 
memory. A straightforward adaptation of these in-memory 
algorithms into the external memory can be considerably 
inefficient due to the occurrence of excessive I/O's. 

In this paper, we propose the first external-memory algo- 
rithm, called ExactMaxRS, for the maximizing range sum 
(MaxRS) problem. The basic processing scheme of ExactMaxRS
follows the distribution-sweep paradigm [10], which was
introduced as an external version of the plane-sweep algorithm.
Basically, we divide the entire dataset into smaller
sets, and recursively process the smaller datasets until the 
size of a dataset gets small enough to fit in memory. By 
doing this, the ExactMaxRS algorithm gives an exact solu- 
tion to the MaxRS problem. We derive the upper bound 
of the I/O complexity of the algorithm. Indeed, this upper 
bound is proved to be the lower bound under the comparison 
model in external memory, which implies that our algorithm 
is optimal. 

Furthermore, we propose an approximation algorithm, 
called ApproxMaxCRS, for the maximizing circular range 
sum (MaxCRS) problem. This problem is the circle version 
of the MaxRS problem, and is more useful than the rectan- 
gle version, when a boundary with the same distance from a 
location is required. In order to solve the MaxCRS problem, 
we apply the ExactMaxRS algorithm to the set of Minimum 
Bounding Rectangles (MBR) of the data circles. After ob- 
taining a solution from the ExactMaxRS algorithm, we find 
an approximate solution for the MaxCRS problem by choos- 
ing one of the candidate points, which are generated from 
the point returned from the ExactMaxRS algorithm. We 
prove that ApproxMaxCRS gives a (1/4)-approximate solution
in the worst case, and also show by experiments that
the approximation ratio is much better in practice. 

Contributions. We summarize our main contributions as 
follows: 

• We propose the ExactMaxRS algorithm, the first
external-memory algorithm for the MaxRS problem.
We also prove both the correctness and optimality of
the algorithm.

• We propose the ApproxMaxCRS algorithm, an ap- 
proximation algorithm for the MaxCRS problem. We 
also prove the correctness as well as tightness of the 
approximation bound with regard to this algorithm. 

• We experimentally evaluate our algorithms using both 
real and synthetic datasets. From the experimental re- 
sults, we show that the ExactMaxRS algorithm is two 
orders of magnitude faster than methods adapted from 
existing algorithms, and the approximation bound of 
the ApproxMaxCRS algorithm in practice is much bet- 
ter than its theoretical bound. 

Organization. In Section 2, we formally define the prob- 
lems studied in this paper, and explain our computation 
model. In Section 3, related work is discussed. In Section 4, 
we review the in-memory algorithms proposed in the com- 
putational geometry community. In Sections 5 and 6, the 
ExactMaxRS algorithm and ApproxMaxCRS algorithm are 
derived, respectively. In Section 7, we show experimental 
results. Conclusions are made and future work is discussed 
in Section 8. 

2. PROBLEM FORMULATION 

Let us consider a set of spatial objects, denoted by O. 
Each object o ∈ O is located at a point in the 2-dimensional
space, and has a non-negative weight w(o). We also use P
to denote the infinite set of points in the entire data space.

Let r(p) be a rectangular region of a given size centered
at a point p ∈ P, and O_{r(p)} be the set of objects covered by
r(p). Then the maximizing range sum (MaxRS) problem is
formally defined as follows:

Definition 1 (MaxRS Problem). Given P, O, and
a rectangle of a given size, find a location p that maximizes:

\[ \sum_{o \in O_{r(p)}} w(o) \]

Similarly, let c(p) be a circular region centered at p with
a given diameter, and O_{c(p)} be the set of objects covered
by c(p). Then we define the maximizing circular range sum
(MaxCRS) problem as follows:

Definition 2 (MaxCRS Problem). Given P, O, and
a circle of a given diameter, find a location p that maximizes:

\[ \sum_{o \in O_{c(p)}} w(o) \]

For simplicity, we discuss only the SUM function in this
paper, even though our algorithms can be applied to other
aggregates such as COUNT and AVERAGE. Without loss of
generality, objects on the boundary of the rectangle or
the circle are excluded.
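
The two objective functions can be made concrete with a small in-memory sketch. It is only an illustration of Definitions 1 and 2, not part of the proposed algorithms; the function names, the (x, y, w) tuple layout, and the reading of the rectangle size as width d1 and height d2 are assumptions made for this example. Strict inequalities reflect the convention that objects on the boundary are excluded.

```python
import math

def rect_range_sum(objects, p, d1, d2):
    """Sum of weights of objects strictly inside the d1-by-d2 rectangle centered at p.

    objects: iterable of (x, y, w) tuples (hypothetical layout).
    """
    px, py = p
    return sum(w for x, y, w in objects
               if abs(x - px) < d1 / 2 and abs(y - py) < d2 / 2)

def circle_range_sum(objects, p, diameter):
    """Sum of weights of objects strictly inside the circle of the given diameter centered at p."""
    px, py = p
    return sum(w for x, y, w in objects
               if math.hypot(x - px, y - py) < diameter / 2)
```

Evaluating either function is cheap for a single location; the difficulty of MaxRS and MaxCRS lies in the fact that p ranges over infinitely many locations.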

Since we focus on a massive number of objects that do 
not fit in the main memory, the whole dataset O is assumed 
to be stored in external memory such as a disk. Therefore, 
we follow the standard external memory (EM) model [10] to 
develop and analyze our algorithms. According to the EM 
model, we use the following parameters: 

N : the number of objects in the database (i.e., |O|)

M : the number of objects that can fit in the main memory 

B : the number of objects per block 

We comply with the assumption that N is much larger
than M and B, and the main memory has at least two blocks
(i.e., M ≥ 2B).

In the EM model, the time of an algorithm is measured 
by the number of I/O's rather than the number of basic 
operations as in the random access memory (RAM) model. 
Thus, when we say linear time in the EM model, it means 
that the number of blocks transferred between the disk and 
memory is bounded by O(N/B) instead of O(N). Our goal
is to minimize the total number of I/O's in our algorithms. 

3. RELATED WORK 

We first review the range aggregate processing methods 
in spatial databases. The range aggregate (RA) query was 
proposed for the scenario where users are interested in sum- 
marized information about objects in a given range rather 
than individual objects. Thus, an RA query returns an aggregation
value over objects qualified for a given range. In
order to efficiently process RA queries, usually aggregate in- 
dexes [5, 12, 13, 15, 17] are deployed as the underlying access 
method. To calculate the aggregate value of a query region, 
a common idea is to store a pre-calculated value for each 
entry in the index, which usually indicates the aggregation 
of the region specified by the entry. However, the MaxRS 
problem cannot be efficiently solved using aggregate indexes, 
because the key is to find out where the best rectangle is. A 
naive solution to the MaxRS problem is to issue an infinite 
number of RA queries, which is prohibitively expensive. 

Recently, research on the selection of optimal locations
in spatial databases has been reported, and it is
the previous work most related to ours. Du et al. proposed
the optimal-location query [9], which returns a location in a 
query region to maximize the influence that is defined to be 
the total weight of the reverse nearest neighbors. They also 
defined a different query semantics in their extension [22], 
called min-dist optimal-location query. In both works, their 
problems are stated under L1 distance. Similarly, the
maximizing bichromatic reverse nearest neighbor (MaxBRNN) problem
was studied by Wong et al. [18] and Zhou et al. [23]. This is
similar to the problem in [9] except that L2 distance, instead
of L1 distance, is considered, making the problem more
difficult. Moreover, Xiao et al. [20] applied optimal-location
queries to road network environments. 

However, all these works share the spirit of the classic fa- 
cility location problem, where there are two kinds of objects 
such as customers and service sites. The goal of these works 
is essentially to find a location that is far from the competi- 
tors and yet close to customers. This is different from the 
MaxRS (MaxCRS) problem, since we aim at finding a loca- 
tion with the maximum number of objects around, without 
considering any competitors. We have seen the usefulness 
of this configuration in Section 1. 

There is another type of location selection problem, where
the goal is to find top-k spatial sites based on a given ranking 
function such as the weight of the nearest neighbor. Xia et 
al. proposed the top-t most influential site query [19]. Later, 
the top-k spatial preference query was proposed in [16, 21], 
which deals with a set of classified feature objects such as 
hotels, restaurants, and markets by extending the previous
work. Even though some of these works consider the range
sum function as a ranking function, their goal is to choose 
one of the candidate locations that are predefined. How- 
ever, there are an infinite number of candidate locations in 
the MaxRS (MaxCRS) problem, which implies that these al- 
gorithms are not applicable to the problem we are focusing 
on. 

From a theoretical perspective, MaxRS and MaxCRS have
been studied in the past. Specifically, in the computational
geometry community, there was active research on the
max-enclosing polygon problem. The purpose is to find a
position of a given polygon to enclose the maximum number
of points. This is almost the same as the MaxRS problem
when the polygon is a rectangle. For the max-enclosing
rectangle problem, Imai et al. proposed an optimal in-memory
algorithm [11] whose time complexity is O(n log n), where n
is the number of rectangles. Actually, they solved a prob- 
lem of finding the maximum clique in the rectangle intersec- 
tion graph based on the well-known plane-sweep algorithm, 
which can be also used to solve the max-enclosing rectangle 
problem by means of a simple transformation [14]. Inher- 
ently, however, these in-memory algorithms do not consider 
a scalable environment that we are focusing on. 

Alongside the above works, there were also works
to solve the max-enclosing circle problem, which is similar
to the MaxCRS problem. Chazelle et al. [4] were the first
to propose an O(n^2) algorithm for this problem by finding
a maximum clique in a circle intersection graph. The
max-enclosing circle problem is actually known to be 3SUM-hard
[3], namely, it is widely conjectured that no algorithm can
terminate in less than Ω(n^2) time in the worst case.
Therefore, several approximation approaches were proposed to
reduce the time complexity. Recently, Berg et al. proposed
a (1 − ε)-approximation algorithm [7] with time complexity
O(n log n + n ε^{-3}). They divide the entire dataset into
a grid, and then compute the local optimal solution for a 
grid cell. After that the local solutions of cells are combined 
using a dynamic-programming scheme. However, it is gen- 
erally known that a standard implementation of dynamic 
programming leads to poor I/O performance [6], which is 
the reason why it is difficult for this algorithm to be scal- 
able. 

4. PRELIMINARIES 

In this section, we explain more details about the solutions 
proposed in the computational geometry community. Our 
solution also shares some of the ideas behind those works. 
In addition, we show that the existing solutions cannot be 
easily adapted to our environment, where a massive size of 
data is considered. 

First, let us review the idea of transforming the max- 
enclosing rectangle problem into the rectangle intersection 
problem in [14]. The max-enclosing rectangle problem is the 
same as the MaxRS problem except that it considers only 
the count of the objects covered by a rectangle (equivalently,
each object has weight 1). The rectangle intersection prob- 
lem is defined as "Given a set of rectangles, find an area 
where the most rectangles intersect". Even though these 
two problems appear to be different at first glance, it has 
been proved that the max-enclosing rectangle problem can 
be mapped to the rectangle intersection problem [14]. 

We explain this by introducing a mapping example shown 
in Figure 2. Suppose that the dataset has four objects 
(black-filled) as shown in Figure 2(a). Given a rectangle
of size d1 × d2, an optimal point can be the center point
p of rectangle r (see Figure 2(a)). To transform the prob- 
lem, we draw a rectangle of the same size centered at the 
location of each object as shown in Figure 2(b). It is not 
difficult to observe that the optimal point p in the max- 
enclosing rectangle problem can be any point in the most 
overlapped area (gray-filled) which is the outcome of the 
rectangle intersection problem. Thus, once we have found 
the most overlapped area in the transformed rectangle inter- 
section problem, the optimal location of the max-enclosing 
rectangle problem can trivially be obtained. 
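
The mapping can be checked on a small instance with a brute-force sketch. This is not the plane-sweep algorithm of [11]; it simply draws one rectangle per object and tests one candidate point inside every cell induced by the rectangle edges, which is enough to hit the most overlapped area. The function name, the (x, y) input layout, and the reading of d1 as width and d2 as height are assumptions made for this illustration.

```python
def most_overlapped_point(objects, d1, d2):
    """Return (point, count): a point covered by the most d1-by-d2 rectangles,
    where one rectangle is centered at each object (x, y)."""
    rects = [(x - d1 / 2, y - d2 / 2, x + d1 / 2, y + d2 / 2) for x, y in objects]
    # Distinct edge coordinates partition the plane into cells of constant overlap.
    xs = sorted({v for r in rects for v in (r[0], r[2])})
    ys = sorted({v for r in rects for v in (r[1], r[3])})
    best, best_p = 0, None
    for ax, bx in zip(xs, xs[1:]):
        cx = (ax + bx) / 2                      # x-coordinate inside the cell
        for ay, by in zip(ys, ys[1:]):
            cy = (ay + by) / 2                  # y-coordinate inside the cell
            count = sum(1 for x1, y1, x2, y2 in rects
                        if x1 < cx < x2 and y1 < cy < y2)
            if count > best:
                best, best_p = count, (cx, cy)
    return best_p, best
```

A rectangle of the same size centered at the returned point covers exactly the objects whose rectangles overlap there, which is the content of the transformation.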














(a) Max-enclosing rectangle problem    (b) Rectangle intersection problem

Figure 2: An example of transformation

For the rectangle intersection problem, an in-memory
algorithm was proposed in [11], which is based on the
well-known plane-sweep algorithm. Basically, the algorithm
regards the edges of rectangles as intervals and maintains a
binary tree while sweeping a conceptual horizontal line from
bottom to top. When the line meets the bottom (top) edge
of a rectangle, a corresponding interval is inserted to (deleted
from) the binary tree, along with updating the counts of
intervals currently residing in the tree, where the count of
an interval indicates the number of intersecting rectangles
within the interval. An interval with the maximum count
during the whole sweeping process is returned as the final
result. The time complexity of this algorithm is O(n log n),
where n is the number of rectangles, since n insertions and
n deletions are performed during the sweep, and the cost of
each tree operation is O(log n). This is the best efficiency
possible in terms of the number of comparisons [11].

Unfortunately, this algorithm cannot be directly applied
to our environment that is focused on massive datasets, since
the plane-sweep algorithm is an in-memory algorithm based
on the RAM model. Furthermore, a straightforward adaptation
of using the B-tree instead of the binary tree still
requires a large amount of I/O's, in fact O(N log_B N). Note
that the factor of N is very expensive in the sense that linear
cost is only O(N/B) in the EM model.

5. EXACT ALGORITHM FOR MAXIMIZING RANGE SUM

In this section, we propose an external-memory algorithm,
namely ExactMaxRS, that exactly solves the MaxRS problem
in O((N/B) log_{M/B}(N/B)) I/O's. This is known [2,
11] to be the lower bound under the comparison model in
external memory.

5.1 Overview

Essentially, our solution is based upon the transformation
explained in Section 4. Specifically, to transform the MaxRS
problem, for each object o ∈ O, we construct a corresponding
rectangle r_o which is centered at the location of o and
has a weight w(o). All these rectangles have the same size,
which is as specified in the original problem. We use R to
denote the set of these rectangles. Also, we define two
notions which are needed to define our transformed MaxRS
problem later:

Definition 3 (Location-weight). Let p be a location
in P, the infinite set of points in the entire data space. Its
location-weight with regard to R equals the sum of the weights
of all the rectangles in R that cover p.

Definition 4 (Max-region). The max-region ρ with
regard to R is a rectangle such that:

• every point in ρ has the same location-weight τ, and

• no point in the data space has a location-weight higher
than τ.

Intuitively, the max-region ρ with regard to R is an
intersecting region with the maximum sum of the weights of the
overlapping rectangles. Then our transformed MaxRS problem
can be defined as follows:

Definition 5 (Transformed MaxRS Problem).
Given R, find a max-region ρ with regard to R.

Apparently, once the above problem is solved, we can return
an arbitrary point in ρ as the answer for the original
MaxRS problem.

At a high level, the ExactMaxRS algorithm follows the 
divide-and-conquer strategy, where the entire dataset is
recursively divided into mutually disjoint subsets, and then
the solutions that are locally obtained in the subsets are 
combined. The overall process of the ExactMaxRS algo- 
rithm is as follows: 

1. Recursively divide the whole space vertically into m
sub-spaces, called slabs and denoted as γ_1, ..., γ_m, each
of which contains roughly the same number of rectangles,
until the rectangles belonging to each slab can fit
in the main memory.

2. Compute a solution structure for each slab, called a
slab-file, which represents the local solution to the
sub-problem with regard to the slab.

3. Merge m slab-files to compute the slab-file for the
union of the m slabs until only one slab-file remains.

In this process, we need to consider the following: (1) How 
to divide the space to guarantee the termination of recursion; 
(2) how to organize slab-files, and what should be included 
in a slab-file; (3) how to merge the slab-files without loss of 
any necessary information for finding the final solution. 

5.2 ExactMaxRS 

Next we address each of the above considerations, and 
explain in detail our ExactMaxRS algorithm. 

5.2.1 Division Phase 

Let us start with describing our method for dividing the
entire space. Basically, we recursively divide the space
vertically into m slabs along the x-dimension until the
rectangles in a slab can fit in the main memory. Since a
rectangle in R can be large, it is unavoidable that a rectangle
may need to be split into a set of smaller disjoint rectangles
as the recursion progresses, which is shown in Figure 3. As
a naive approach, we could just insert all the split rectangles
into the corresponding slabs at the next level of recursion.
In Figure 3, the three parts of rectangle r will be inserted
into slabs γ_1, γ_2, and γ_3, respectively.

Figure 3: An example of splitting a rectangle

However, it is not hard to see that this approach does
not guarantee the termination of recursion, since rectangles
may span an entire slab, e.g., the middle part of r spans
slab γ_2. In the extreme case, suppose that all rectangles
span a slab γ. Then, no matter how many times we divide
γ into sub-slabs, the number of rectangles in each sub-slab
still remains the same, meaning that the recursion will never
terminate.

Therefore, in order to gradually reduce the number of rect- 
angles for each sub-problem, we do not pass spanning rect- 
angles to the next level of recursion, e.g., the middle part of 
r will not be inserted in the input of the sub-problem with 
regard to γ_2. Instead, the spanning rectangles are considered
as another local solution for a separate, special
sub-problem. Thus, in the merging phase, the spanning rectangles
are also merged along with the other slab-files. In this
way, it is guaranteed that recursion will terminate eventually 
as proved in the following lemma: 

Lemma 1. After O(log_m(N/M)) recursion steps, the number
of rectangles in each slab will fit in the main memory.

Proof. Since the spanning rectangles do not flow down
to the next recursion step, we can just partition the vertical
edges of rectangles. There are initially 2N vertical edges.
The number of edges in a sub-problem will be reduced by a
factor of m by dividing the set of edges into m smaller sets
each of which has roughly the same size. Each vertical edge
in a slab represents a split rectangle. It is obvious that there
exists an h such that 2N/m^h < M. The smallest such h is
thus O(log_m(N/M)). □
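
The treatment of spanning parts in the division phase can be sketched as follows. This is a simplified illustration that looks only at the x-extent of one rectangle; the function name, the boundary-list representation of slabs, and the return layout are assumptions made for this example rather than details given in the text.

```python
def split_rectangle(x1, x2, boundaries):
    """Split the x-extent [x1, x2] of a rectangle across slabs.

    boundaries: sorted slab boundaries b_0 < b_1 < ... < b_m, so that slab i
    covers [b_i, b_{i+1}] (hypothetical representation).
    Returns (partial, spanning):
      partial  - list of (slab_index, lo, hi) pieces passed to the next recursion level
      spanning - list of slab indices that the rectangle spans entirely
    """
    partial, spanning = [], []
    for i in range(len(boundaries) - 1):
        lo = max(x1, boundaries[i])
        hi = min(x2, boundaries[i + 1])
        if lo >= hi:
            continue                      # rectangle does not reach this slab
        if lo == boundaries[i] and hi == boundaries[i + 1]:
            spanning.append(i)            # kept aside for the merging phase
        else:
            partial.append((i, lo, hi))   # contains a vertical edge of the rectangle
    return partial, spanning
```

Each rectangle contributes at most two partial pieces, one per vertical edge, which is why the proof of Lemma 1 can count vertical edges instead of rectangles.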

Determination of m. We set m = Θ(M/B), where M/B
is the number of blocks in the main memory. 
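
To give a feeling for the magnitudes involved, the following sketch evaluates the bound of Lemma 1 numerically. Taking m to be exactly M/B (constant factor 1) and the sample values of N, M, and B are assumptions chosen only for illustration.

```python
def slab_fanout(M, B):
    """Number of slabs per recursion step, taking m = M/B (assumed constant factor 1)."""
    return max(2, M // B)

def recursion_depth(N, M, B):
    """Smallest h such that 2N / m^h < M, as in the proof of Lemma 1."""
    m = slab_fanout(M, B)
    h = 0
    while 2 * N >= M * m ** h:
        h += 1
    return h

def scan_ios(N, B):
    """Number of block transfers needed to read N objects once (linear cost in the EM model)."""
    return -(-N // B)  # ceiling division on integers
```

For example, with N = 10^9 objects, M = 10^6, and B = 10^3, the fan-out is 1000 and two recursion steps suffice, while a single linear scan costs 10^6 I/O's.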

5.2.2 Slab-files 

The next important question is how to organize a slab-file. 
What the question truly asks about is what structure should 
be returned after conquering the sub-problem with regard 
to a slab. Each slab-file should have enough information to 
find the final solution after all the merging phases. 

To get the intuition behind our solution (to be clarified
shortly), let us first consider an easy scenario where every
rectangle has weight 1, and is small enough to be totally
inside a slab, which is shown in Figure 4. Thus, no spanning
rectangle exists. In this case, all we have to do is to just
maintain a max-region (black-filled in Figure 4) with regard
to rectangles in each slab. Recall that a max-region is the
most overlapped area with respect to the rectangles in the
corresponding slab (see Definition 4). Then, in the merging
phase, among m max-regions (i.e., one for each slab), we
can choose the best one as the final solution. In Figure 4,
for instance, the best one is ρ_m because it is the intersection
of 3 rectangles, whereas the number is 2 for the max-regions
of the other slabs.

Figure 4: An easy scenario to illustrate the intuition
of slab-files

Extending the above idea, we further observe that the hor- 
izontal boundaries of a max-region are laid on the horizontal 
lines passing the bottom or top edge of a certain rectangle. 
Let us use the term h-line to refer to a horizontal line pass- 
ing a horizontal edge of an input rectangle. Therefore, for 
each h-line in a slab, it suffices to maintain a segment that 
could belong to the max-region of the slab. To formalize 
this intuition, we define max-interval as follows: 

Definition 6 (Max-interval). Let (1) ℓ.y be the
y-coordinate of an h-line ℓ, and ℓ_1 and ℓ_2 be consecutive
h-lines such that ℓ_1.y < ℓ_2.y, (2) ℓ ∩ γ be the part of an h-line
ℓ in a slab γ, and (3) r_γ be the rectangle formed by ℓ_1.y, ℓ_2.y,
and the vertical boundaries of γ. A max-interval is a segment
t on ℓ_1 ∩ γ such that the x-range of t is the x-range of the
rectangle r_max bounded by ℓ_1.y, ℓ_2.y, and vertical lines at x_i
and x_j, where each point in r_max has the maximum
location-weight among all points in r_γ.



Figure 5 illustrates Definition 6. 




Figure 5: An illustration of Definition 6 

Our slab-file is a set of max-intervals defined only on h- 
lines. Specifically, each max-interval is represented as a tu- 
ple specified as follows: 

t = <y, [x_1, x_2], sum>

where y is the y-coordinate of t (hence, also of the h-line
that defines it), [x_1, x_2] is the x-range of t, and sum is
the location-weight of any point in t. In addition, all the
tuples in a slab-file should be sorted in ascending order of
y-coordinates.

Example 1. Figure 6 shows the slab-files that are generated
from the example in Figure 2, assuming that m = 4 and
∀o ∈ O, w(o) = 1. Max-intervals are represented as solid
segments. For instance, the slab-file of slab γ_1 consists of
tuples (in this order): <y_2, [x_1, x_2], 1>, <y_4, [x_1, x_2], 2>,
<y_6, [x_0, x_2], 1>, <y_7, [−∞, x_2], 0>. The first tuple
<y_2, [x_1, x_2], 1> implies that, in slab γ_1, on any
horizontal line with y-coordinate in (y_2, y_4), the max-interval
is always [x_1, x_2], and its sum is 1. Similarly, the second
tuple <y_4, [x_1, x_2], 2> indicates that, on any horizontal
line with y-coordinate in (y_4, y_6), [x_1, x_2] is always the
max-interval, and its sum is 2. Note that spanning rectangles
have not been counted yet in these slab-files, since
(as mentioned earlier) they are not part of the input to the
sub-problems with regard to slabs γ_1, ..., γ_4.

Figure 6: An example of slab-files



Lemma 2. Let K be the number of rectangles in a slab. 
Then the number of tuples in the corresponding slab-file is 
O(K).

Proof. The number of h-lines is at most double the num- 
ber of rectangles. As a h-line defines only one max-interval 
in each slab, the number of tuples in a slab-file is at most 
2K , which is O(K). □ 
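
For a slab whose rectangles already fit in memory, a slab-file in the tuple format above can be produced by a direct, quadratic-time sketch such as the one below. It is meant only to make Definition 6 and Lemma 2 concrete: it recomputes the overlap at every h-line instead of maintaining a tree as a plane sweep would. The function name and the (x1, y1, x2, y2, w) input layout are assumptions for this illustration, and when several disjoint segments tie for the maximum, the leftmost one is chosen arbitrarily.

```python
def slab_file(rects, x_lo, x_hi):
    """Compute one tuple (y, x1, x2, sum) per h-line for a slab [x_lo, x_hi].

    rects: list of (x1, y1, x2, y2, w), already clipped to the slab,
    with non-negative weights w.
    """
    h_lines = sorted({y for r in rects for y in (r[1], r[3])})
    tuples = []
    for y in h_lines:
        # Rectangles covering the strip that starts at this h-line.
        active = [r for r in rects if r[1] <= y < r[3]]
        xs = sorted({x_lo, x_hi, *(x for r in active for x in (r[0], r[2]))})
        best, best_range = -1, None
        for a, b in zip(xs, xs[1:]):
            s = sum(r[4] for r in active if r[0] <= a and b <= r[2])
            if s > best:
                best, best_range = s, [a, b]       # new maximum starts here
            elif s == best and best_range[1] == a:
                best_range[1] = b                  # extend a contiguous maximum
        tuples.append((y, best_range[0], best_range[1], best))
    return tuples
```

The output is sorted by y because the h-lines are processed in ascending order, and it contains at most 2K tuples for K input rectangles.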

5.2.3 Merging Phase 

Now we tackle the last challenge: how to merge the slab- 
files, which is also the main part of our algorithm. 

The merging phase sweeps a horizontal line across the 
slab-files and the file containing spanning rectangles. At 
each h-line, we choose a max-interval with the greatest sum 
among the max-intervals with regard to the m slabs, re- 
spectively. Sometimes, max-intervals from adjacent slabs 
are combined into a longer max-interval. 

The details of merging, namely MergeSweep, are presented
in Algorithm 1. The input includes a set of spanning rectangles
and m slab-files. Also, each spanning rectangle contains
only the spanning part cropped out of the original rectangle
r_o ∈ R, and has the same weight as r_o (recall that the
weight of r_o is set to w(o)). We use upSum[i] to denote
the total weight of spanning rectangles that span slab γ_i
and currently intersect the sweeping line; upSum[i] is
initially set to 0 (Line 2). Also, we set t_slab[i] to be the
tuple representing the max-interval of γ_i in the sweeping line.
Since we sweep the line from bottom to top, we initially set
t_slab[i].y = −∞. In addition, the initial interval and sum
of t_slab[i] are set to be the x-range of γ_i and 0, respectively
(Line 3). When the sweeping line encounters the bottom
of a spanning rectangle that spans γ_i, we add the weight of
the rectangle to upSum[i] (Lines 6-8); conversely, when
the sweeping line encounters the top of the spanning rectangle,
we subtract the weight of the rectangle (Lines 9-11).
When the sweeping line encounters several tuples (from
different slab-files) having the same y-coordinate (Line 12), we
first update the t_slab[i]'s accordingly (Lines 13-16), and then
identify the tuples with the maximum sum among all the
t_slab[i]'s (Line 17). Since there can be multiple tuples with
the same maximum sum at an h-line, we call a function
GetMaxInterval to generate a single tuple from those tuples
(Line 18). Specifically, given a set of tuples with the same
sum value, GetMaxInterval simply performs:

1. If the max-intervals of some of those tuples are con- 
secutive, merge them into one tuple with an extended 
max-interval. 

2. Return an arbitrary one of the remaining tuples after 
the above step. 

Lastly, we insert the tuple generated from GetMaxInterval 
into the slab-file to be returned (Line 20). This process will
continue until the sweeping line reaches the end of all the 
slab files and the set of spanning rectangles. 
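
The merging step described above can be sketched in memory as follows. This is a simplified illustration rather than the external-memory procedure: all inputs are held in Python lists, the effective sum of every slab is recomputed at each event (including edges of spanning rectangles), and the names, the (y, x1, x2, sum) tuple layout, and the (y_bottom, y_top, first_slab, last_slab, weight) description of a spanning part are assumptions made for this example.

```python
from itertools import groupby

def get_max_interval(cands):
    """cands: x-ranges (x1, x2) of equally good max-intervals, ordered left to right.
    Merge ranges that touch and return the leftmost merged range."""
    x1, x2 = cands[0]
    for a, b in cands[1:]:
        if a != x2:
            break
        x2 = b            # consecutive intervals: extend
    return x1, x2

def merge_sweep(slab_files, slab_ranges, spanning):
    """Merge m slab-files (lists of (y, x1, x2, sum) sorted by y) into one slab-file."""
    m = len(slab_files)
    up_sum = [0] * m                                   # weight of spanning parts per slab
    t_slab = [(lo, hi, 0) for lo, hi in slab_ranges]   # current max-interval per slab
    events = []
    for i, sf in enumerate(slab_files):
        events += [(y, "tuple", (i, x1, x2, s)) for y, x1, x2, s in sf]
    for y_bot, y_top, first, last, w in spanning:
        events.append((y_bot, "span", (first, last, w)))
        events.append((y_top, "span", (first, last, -w)))
    events.sort(key=lambda e: e[0])

    out = []
    for y, group in groupby(events, key=lambda e: e[0]):
        for _, kind, data in group:
            if kind == "span":
                first, last, delta = data
                for j in range(first, last + 1):
                    up_sum[j] += delta
            else:
                i, x1, x2, s = data
                t_slab[i] = (x1, x2, s)
        totals = [t_slab[i][2] + up_sum[i] for i in range(m)]
        best = max(totals)
        cands = [t_slab[i][:2] for i in range(m) if totals[i] == best]
        x1, x2 = get_max_interval(cands)
        out.append((y, x1, x2, best))
    return out
```

The tuple with the largest sum in the final output identifies the bottom edge and x-range of a max-region of the merged slab.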

Algorithm 1 MergeSweep

Input: $m$ slab-files $S_1, \ldots, S_m$ for $m$ slabs $\gamma_1, \ldots, \gamma_m$, a set of
       spanning rectangles $R'$
Output: a slab-file $S$ for slab $\gamma = \bigcup_{i=1}^{m} \gamma_i$. Initially $S \leftarrow \emptyset$

 1: for $i = 1$ to $m$ do
 2:     upSum[$i$] $\leftarrow 0$
 3:     $t_{slab}[i] \leftarrow \langle -\infty$, the range of x-coordinates of $\gamma_i$, $0 \rangle$
 4: end for
 5: while sweeping the horizontal line $\ell$ from bottom to top do
 6:     if $\ell$ meets the bottom of $r \in R'$ then
 7:         upSum[$j$] $\leftarrow$ upSum[$j$] $+ w(o)$, $\forall j$ s.t. $r$ spans $\gamma_j$
 8:     end if
 9:     if $\ell$ meets the top of $r \in R'$ then
10:         upSum[$j$] $\leftarrow$ upSum[$j$] $- w(o)$, $\forall j$ s.t. $r$ spans $\gamma_j$
11:     end if
12:     if $\ell$ meets a set of tuples $T = \{t \mid t.y = \ell.y\}$ then
13:         for all $t \in T$ do
14:             $t_{slab}[i] \leftarrow t$, s.t. $t \in S_i$
15:             $t_{slab}[i].sum \leftarrow t.sum +$ upSum[$i$], s.t. $t \in S_i$
16:         end for
17:         $T' \leftarrow$ the set of tuples in $t_{slab}[1], \ldots, t_{slab}[m]$ with
            the largest sum values
18:         $t_{max} \leftarrow$ GetMaxInterval($T'$)
19:     end if
20:     $S \leftarrow S \cup \{t_{max}\}$
21: end while
22: return $S$
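
To make the sweep concrete, the following is a minimal in-memory sketch of the merging idea in Algorithm 1. It is an illustration under stated assumptions rather than the external-memory procedure itself: slab-files are plain Python lists, all events are sorted in memory instead of being streamed through block buffers, and the names `merge_sweep`, `slab_ranges`, and the tuple layouts are hypothetical. One design choice differs from the pseudocode: the sketch stores the slab-local sum and adds upSum[i] at comparison time, so that sums stay current when only a spanning rectangle starts or ends at an h-line.

```python
from itertools import groupby


def merge_sweep(slab_files, slab_ranges, spanning):
    """In-memory sketch of MergeSweep.

    slab_files[i]: list of (y, x1, x2, sum) tuples of slab i.
    slab_ranges[i]: (x1, x2) of slab i, slabs ordered from left to right.
    spanning: list of (y_bottom, y_top, weight, slab_indices) rectangles.
    Returns the merged slab-file as a list of (y, x1, x2, sum) tuples.
    """
    m = len(slab_files)
    up_sum = [0] * m
    # Current max-interval of each slab: (x1, x2, slab-local sum).
    current = [(x1, x2, 0) for (x1, x2) in slab_ranges]

    events = []
    for i, slab_file in enumerate(slab_files):
        for (y, x1, x2, s) in slab_file:
            events.append((y, "tuple", (i, x1, x2, s)))
    for (y_bottom, y_top, w, slabs) in spanning:
        events.append((y_bottom, "rect", (w, slabs)))   # bottom edge adds w
        events.append((y_top, "rect", (-w, slabs)))     # top edge removes w
    events.sort(key=lambda e: e[0])

    merged = []
    for y, group in groupby(events, key=lambda e: e[0]):
        for _, kind, payload in group:
            if kind == "rect":
                w, slabs = payload
                for j in slabs:
                    up_sum[j] += w
            else:
                i, x1, x2, s = payload
                current[i] = (x1, x2, s)
        totals = [current[i][2] + up_sum[i] for i in range(m)]
        best = max(totals)
        # GetMaxInterval: merge consecutive best intervals, keep the leftmost run.
        x1 = x2 = None
        for i in range(m):
            if totals[i] != best:
                continue
            if x1 is None:
                x1, x2 = current[i][0], current[i][1]
            elif current[i][0] == x2:
                x2 = current[i][1]
            else:
                break
        merged.append((y, x1, x2, best))
    return merged
```

With two slabs covering [0, 10] and [10, 20] and one spanning rectangle of weight 1 over the first slab, the output contains one tuple per event h-line, and intervals from adjacent slabs are merged whenever they tie and touch.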
Example 2. Figure 7 shows how the MergeSweep algorithm
works by using Example 1. For clarity, rectangles are
removed, and the sum value of each max-interval is given
above the segment representing the max-interval. Also, the
value of upSum for each slab is given as a number enclosed
in brackets, e.g., upSum[2] = 1 between $y_2$ and $y_6$.

When the sweeping line $\ell$ is located at $y_0$, two max-intervals
from $\gamma_3$ and $\gamma_4$ are merged into a larger max-interval. On the
other hand, when $\ell$ is located at $y_1$, the max-interval from $\gamma_4$
is chosen, since its sum value 2 is the maximum among the 2
max-intervals at $y_1$. In addition, it is important to note that
the sum values of the max-intervals at $y_4$ and $y_5$ are increased
by the value of upSum[2] = 1. Figure 7(b) shows the resulting
max-intervals at the end of merging slab-files. We can
find that the max-region of the entire data space is between
the max-intervals at $y_4$ and $y_5$, because the max-interval at $y_4$
has the highest sum value 3.




Figure 7: An example to illustrate the MergeSweep algorithm.
(a) Four slab-files before merge; (b) a slab-file after merge.

We can derive the following lemma: 

Lemma 3. Let $K$ be the number of rectangles in slab $\gamma$
in a certain recursion. Given $m$ slab-files $S_1, \ldots, S_m$ of slabs
$\gamma_1, \ldots, \gamma_m$, s.t. $\gamma = \bigcup_{i=1}^{m} \gamma_i$, and a set of spanning rectangles
$R'$, the MergeSweep algorithm returns the slab-file $S$ of $\gamma$ in
$O(K/B)$ I/O's.



Proof. Since we set $m = \Theta(M/B)$, a block of memory
can be allocated as the input buffer for each slab-file as well
as the file containing spanning rectangles. Also, we use
another block of memory for the output buffer. By doing this,
we can read a tuple of slab-files or a spanning rectangle, or
write a tuple to the merged slab-file in $O(1/B)$ I/O's
amortized.

The number of I/O's performed by MergeSweep is
proportional to the total number of tuples of all slab-files plus the
number of spanning rectangles, i.e., $O((|R'| + \sum_{i=1}^{m} |S_i|)/B)$.
Let $K_i$ be the number of rectangles in $\gamma_i$. Then $|S_i| = O(K_i)$
by Lemma 2. Also, $K_i = \Theta(K/m)$, since the $2K$
vertical edges of the $K$ rectangles are divided into $m$ slabs
evenly. Therefore, $\sum_{i=1}^{m} |S_i| = O(K)$, which leads to
$O((|R'| + \sum_{i=1}^{m} |S_i|)/B) = O(K/B)$, since $|R'| \le K$. □

5.2.4 Overall Algorithm 

The overall recursive algorithm ExactMaxRS is presented
in Algorithm 2. We can obtain the final slab-file with regard
to a set $R$ of rectangles by calling ExactMaxRS($R$, $\gamma$, $m$),
where the x-range of $\gamma$ is $(-\infty, \infty)$. Note that when the
input set of rectangles can fit in the main memory, we invoke
PlaneSweep($R$) (Line 9), which is an in-memory algorithm
that does not cause any I/O's.

Algorithm 2 ExactMaxRS

Input: a set of rectangles $R$, a slab $\gamma$, the number of sub-slabs $m$
Output: a slab-file $S$ for $\gamma$

 1: if $|R| > M$ then
 2:     Partition $\gamma$ into $\gamma_1, \ldots, \gamma_m$, which have roughly the
        same number of rectangles.
 3:     Divide $R$ into $R_1, \ldots, R_m, R'$, where $R_i$ is the set of
        non-spanning rectangles whose left (or right) vertical
        edges are in $\gamma_i$ and $R'$ is the set of spanning rectangles.
 4:     for $i = 1$ to $m$ do
 5:         $S_i \leftarrow$ ExactMaxRS($R_i$, $\gamma_i$, $m$)
 6:     end for
 7:     $S \leftarrow$ MergeSweep($S_1, \ldots, S_m$, $R'$)
 8: else
 9:     $S \leftarrow$ PlaneSweep($R$)
10: end if
11: return $S$

From the returned $S$, we can find the max-region simply by
comparing the sum values of its tuples. After finding the
max-region, an optimal point for the MaxRS problem can be any
point in the max-region, as mentioned in Section 5.1.
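
As an illustration of the base case and of the final scan, the sketch below computes one max-interval per h-line for a small set of rectangles and then picks the tuple with the largest sum. It is a quadratic in-memory stand-in written for clarity, not the PlaneSweep algorithm invoked in Line 9; the function names, the `(x1, y1, x2, y2, w)` rectangle layout, and the assumption of positive weights are all choices made for this example.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, weight)
SlabTuple = Tuple[float, float, float, float]    # (y, x1, x2, sum)


def plane_sweep(rects: List[Rect]) -> List[SlabTuple]:
    """Return one max-interval tuple per h-line (simple quadratic version)."""
    ys = sorted({r[1] for r in rects} | {r[3] for r in rects})
    out = []
    for y in ys:
        # Net weight change at each x for the rectangles crossing this h-line.
        delta = {}
        for (x1, y1, x2, y2, w) in rects:
            if y1 <= y < y2:
                delta[x1] = delta.get(x1, 0) + w
                delta[x2] = delta.get(x2, 0) - w
        xs = sorted(delta)
        # If no rectangle is active, the whole x-range has sum 0.
        best = (-math.inf, math.inf, 0)
        running = 0
        for left, right in zip(xs, xs[1:]):
            running += delta[left]
            if running > best[2]:
                best = (left, right, running)
        out.append((y, best[0], best[1], best[2]))
    return out


def max_region_tuple(slab_file: List[SlabTuple]) -> SlabTuple:
    """Scan the slab-file once and return a tuple with the largest sum."""
    return max(slab_file, key=lambda t: t[3])
```

The max-region is then bounded below by the returned tuple's h-line, above by the next h-line in the slab-file, and horizontally by the tuple's max-interval.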

The correctness of Algorithm 2 is proved by the following 
lemma and theorem: 

Lemma 4. Let $I^*$ be a max-interval at an h-line with regard
to the entire space, and let $I^*_1, \ldots, I^*_\mu$ be consecutive pieces of $I^*$
for a recursion, each of which belongs to slab $\gamma_i$, where
$1 \le i \le \mu$. Then $I^*_i$ is also the max-interval at the h-line with
regard to slab $\gamma_i$.

Proof. Let $sum(I)$ be the sum value of interval $I$. To
prove the lemma by contradiction, suppose that there exists
$I^*_i$ that is not a max-interval in $\gamma_i$. Thus, there exists $I'$ in
$\gamma_i$ such that $sum(I') > sum(I^*_i)$ on the same h-line. For
any upper level of recursion, if no rectangle spans $\gamma_i$, then
$sum(I')$ and $sum(I^*_i)$ themselves are already the sum values
with regard to the entire space. On the contrary, if there
exist rectangles that span $\gamma_i$ at some upper level of recursion,
then the sum values of $I'$ and $I^*_i$ with regard to the entire
space will be $sum(I') + W_{span}$ and $sum(I^*_i) + W_{span}$, where
$W_{span}$ is the total sum of the weights of all the rectangles
spanning $\gamma_i$ in all the upper levels of recursion. In both cases
above, $sum(I') > sum(I^*_i)$ with regard to the entire space,
which contradicts that $I^*$ is the max-interval with regard to
the entire space. □

Theorem 1. The slab-file returned from the ExactMaxRS 
algorithm is correct with regard to a given dataset. 

Proof. Let $\rho^*$ be the max-region with regard to a given
dataset, and similarly let $I^*$ be the best max-interval, which is in
fact the bottom edge of $\rho^*$. Then we want to prove that the
algorithm eventually returns a slab-file which contains $I^*$.

By Lemma 4, we can claim that for any level of
recursion, a component interval $I^*_i$ of $I^*$ will also be the
max-interval for its h-line within slab $\gamma_i$. By Algorithm 1,
for each h-line, the best one among the max-intervals at
that h-line is selected (perhaps also extended). Therefore,
eventually $I^*$ will be selected as a max-interval with regard
to the entire space. □

Moreover, we can prove the I/O-efficiency of the Exact- 
MaxRS algorithm as in the following theorem: 

Theorem 2. The ExactMaxRS algorithm solves the
MaxRS problem in $O((N/B) \log_{M/B} (N/B))$ I/O's, which
is optimal in the EM model among all comparison-based
algorithms.

Proof. The dataset needs to be sorted by x-coordinates
before it is fed into Algorithm 2. The sorting can be done in
$O((N/B) \log_{M/B} (N/B))$ I/O's using the textbook
external sort algorithm.

Given a dataset with cardinality $N$ sorted by x-coordinates,
the decomposition of the dataset along the x-dimension can
be performed in linear time, i.e., $O(N/B)$. Also, by Lemma
3, the total I/O cost of the merging process at each recursion
level is also $O(N/B)$, since there can be at most $2N$
rectangles in the input of any recursion. By the proof of Lemma 1,
there are $O(\log_{M/B} (N/B))$ levels of recursion. Hence, the
total I/O cost is $O((N/B) \log_{M/B} (N/B))$.

The optimality of this I/O complexity follows directly
from the results of [2] and [11]. □
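
The accounting in the proof of Theorem 2 can be written out in one line. The display below only restates the three costs named in the proof (sorting, the number of recursion levels, and the linear decomposition and merging cost per level); the underbrace labels are added here for readability.

```latex
\begin{align*}
\text{total I/O}
  &= \underbrace{O\!\left(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B}\right)}_{\text{sorting}}
   + \underbrace{O\!\left(\log_{M/B}\tfrac{N}{B}\right)}_{\text{recursion levels}}
     \cdot
     \underbrace{O\!\left(\tfrac{N}{B}\right)}_{\text{decomposition and merging per level}} \\
  &= O\!\left(\tfrac{N}{B}\log_{M/B}\tfrac{N}{B}\right).
\end{align*}
```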

6. APPROXIMATION ALGORITHM FOR 
MAXIMIZING CIRCULAR RANGE SUM 

In this section, we propose an approximation algorithm,
namely ApproxMaxCRS, for solving the MaxCRS problem
(Definition 2). Our algorithm finds a (1/4)-approximate
solution in $O((N/B) \log_{M/B} (N/B))$ I/O's. We achieve this
by a novel reduction that converts the MaxCRS
problem to the MaxRS problem.

6.1 ApproxMaxCRS 

Recall (from Definition 2) that the goal of the MaxCRS
problem is to find a circle with a designated diameter that
maximizes the total weight of the points covered. Denote by
$d$ the diameter. Following the idea explained in Section 4,
first we transform the MaxCRS problem into the following
problem: Let $C$ be a set of circles each of which is centered
at a distinct object $o \in O$, has a diameter as specified in
the MaxCRS problem, and carries a weight $w(o)$. We want
to find a location $p$ in the data space to maximize the total
weight of the circles in $C$ covering $p$. Figure 8(a) shows an
instance of the transformed MaxCRS problem, where there
are four circles in $C$, each of which is centered at an object
$o \in O$ in the original MaxCRS problem. An optimal answer
can be any point in the gray area.

We will use the ExactMaxRS algorithm developed in the
previous section as a tool to compute a good approximate
answer for the MaxCRS problem. For this purpose, we
convert each circle of $C$ to its Minimum Bounding Rectangle
(MBR). Obviously, the MBR is a $d \times d$ square. Let $R$ be
the set of resulting MBRs. Now, apply ExactMaxRS on $R$,
which outputs the max-region with regard to $R$.
Understandably, the max-region (black area in Figure 8(b))
returned from the ExactMaxRS algorithm may contain
locations that are suboptimal for the original MaxCRS problem
(in Figure 8(b), only points in the gray area are optimal).
Moreover, in the worst case, the max-region may not even
intersect with any circle at all, as shown in Figure 8(c).



Figure 8: Converting MaxCRS to MaxRS. (a) The transformed
MaxCRS problem; (b) MBRs of circles; (c) worst case.



Therefore, in order to guarantee the approximation bound,
it is insufficient to just return a point in the max-region.
Instead, our ApproxMaxCRS algorithm returns the best point
among the center of the max-region and four shifted points.
The algorithm is presented in Algorithm 3.

Algorithm 3 ApproxMaxCRS

Input: a set of circles $C$, a slab $\gamma$ whose range of the
       x-coordinate is $(-\infty, \infty)$, the number of slabs $m$
Output: a point $\hat{p}$

 1: Construct a set $R$ of MBRs from $C$
 2: $\rho \leftarrow$ ExactMaxRS($R$, $\gamma$, $m$)
 3: $p_0 \leftarrow$ the center point of $\rho$
 4: for $i = 1$ to 4 do
 5:     $p_i \leftarrow$ GetShiftedPoint($p_0$, $i$)
 6: end for
 7: $\hat{p} \leftarrow$ the point $p$ among $p_0, \ldots, p_4$ that maximizes the
    total weight of the circles covering $p$
 8: return $\hat{p}$

After obtaining the center point $p_0$ of the max-region $\rho$
returned from the ExactMaxRS function (Lines 2-3), we find
four shifted points $p_i$, where $1 \le i \le 4$, from $p_0$ as shown
in Figure 9 (Lines 4-6). We use $\sigma$ to denote the shifting
distance, which determines how far a shifted point should be
away from the center point. To guarantee the approximation
bound as proved in Section 6.2, $\sigma$ can be set to any value
such that $(\sqrt{2} - 1)\frac{d}{2} \le \sigma \le \frac{d}{2}$. Finally, we return the best
point $\hat{p}$ among $p_0, \ldots, p_4$ (Lines 7-8).

Symbol                  Description
----------------------  ---------------------------------------------
$d$                     the diameter of circles (a given parameter
                        of the MaxCRS problem)
$p_0$                   the centroid of the max-region returned
                        by ExactMaxRS
$p_i$ ($i \in [1,4]$)   a shifted point described in Algorithm 3
$c_i$ ($i \in [0,4]$)   the circle with diameter $d$ centering at
                        point $p_i$
$r_0$                   the MBR of $c_0$
$O(s)$                  the set of objects covered by $s$, where $s$ is
                        a circle or an MBR
$W(s)$                  the total weight of the objects in $O(s)$

Table 1: List of notations




Figure 9: The illustration of shifting points 

Note that Algorithm 3 does not change the I/O complexity
of the ExactMaxRS algorithm, since only linear I/O's
are required in the entire process other than running the
ExactMaxRS algorithm. In particular, Line 7 of Algorithm 3
requires only a single scan of $C$.
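
Lines 3-7 of Algorithm 3 can be sketched as follows. This is a hedged in-memory illustration: the function names are hypothetical, the shifts are assumed to go along the four diagonal directions (the text refers to Figure 9 for the exact placement), and the default shifting distance $\sigma = \sqrt{2}\,d/4$ is just one value inside the admissible range stated above.

```python
import math


def shifted_points(p0, d, sigma=None):
    """Four points at distance sigma from p0 along the diagonals (assumed)."""
    if sigma is None:
        sigma = math.sqrt(2) * d / 4  # lies between (sqrt(2)-1)*d/2 and d/2
    a = sigma / math.sqrt(2)          # per-axis offset of a diagonal shift
    x, y = p0
    return [(x + a, y + a), (x - a, y + a), (x - a, y - a), (x + a, y - a)]


def covering_weight(p, circles, d):
    """Total weight of the circles (cx, cy, w) of diameter d that cover p."""
    radius = d / 2
    return sum(
        w for (cx, cy, w) in circles
        if math.hypot(p[0] - cx, p[1] - cy) <= radius
    )


def best_candidate(p0, circles, d, sigma=None):
    """Return the best point among p0 and its four shifted points."""
    candidates = [p0] + shifted_points(p0, d, sigma)
    # One pass over the circles per candidate; with five candidates this
    # can be folded into a single scan of C.
    return max(candidates, key=lambda p: covering_weight(p, circles, d))
```

For example, with $d = 4$, $p_0 = (0, 0)$, a circle of weight 1 centered at $(0, 0)$ and a circle of weight 5 centered at $(2, 2)$, the shifted point near $(1, 1)$ is covered by both circles and is returned instead of $p_0$.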

6.2 Approximation Bound 

Now, we prove that the ApproxMaxCRS algorithm
returns a (1/4)-approximate answer to the optimal solution,
and also prove that this approximation ratio is tight with
regard to this algorithm. To prove the approximation bound,
we use the fact that a point $p$ covered by the set of circles
(or MBRs) in the transformed MaxCRS problem is truly
the point such that the circle (or MBR) centered at $p$
covers the corresponding set of objects in the original MaxCRS
problem. The main notations used in this section are
summarized in Table 1.

Lemma 5. For each $i \in [0,4]$, let $c_i$ be the circle centered
at point $p_i$, $r_0$ be the MBR of $c_0$, and $O(s)$ be the set of
objects covered by $s$, where $s$ is a circle or an MBR. Then
$O(r_0) \subseteq O(c_1) \cup O(c_2) \cup O(c_3) \cup O(c_4)$.

Proof. As shown in Figure 10, all the objects covered
by $r_0$ are also covered by $c_1$, $c_2$, $c_3$, or $c_4$, since
$(\sqrt{2} - 1)\frac{d}{2} \le \sigma \le \frac{d}{2}$. □



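Let $W(s)$ be the total weight of the objects covered by $s$,
where $s$ is a circle or an MBR. Then, we have:

Lemma 6. $W(r_0) \le 4 \max_{0 \le i \le 4} W(c_i)$.

Proof.

\begin{align*}
W(r_0) &\le \sum_{1 \le i \le 4} W(c_i) && \text{(by Lemma 5)} \\
       &\le 4 \max_{0 \le i \le 4} W(c_i)
\end{align*}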





Figure 10: Lemma 5. (a) $\sigma = (\sqrt{2} - 1)\frac{d}{2}$; (b) $\sigma = \ldots$



Theorem 3. The ApproxMaxCRS algorithm returns a
(1/4)-approximate answer to the MaxCRS problem.

Proof. Recall that $\hat{p}$ is the point returned from
Algorithm 3 as the approximate answer to the MaxCRS
problem. Let point $p^*$ be an optimal answer for the MaxCRS
problem. Denote by $\hat{r}$ and $r^*$ the MBRs centered at points
$\hat{p}$ and $p^*$, respectively. Likewise, denote by $\hat{c}$ and $c^*$ the
circles centered at points $\hat{p}$ and $p^*$, respectively. The goal is
to prove $W(c^*) \le 4W(\hat{c})$.

We achieve this purpose with the following derivation:

$$W(c^*) \le W(r^*) \le W(r_0) \le 4 \max_{0 \le i \le 4} W(c_i) = 4W(\hat{c})$$

The first inequality is because $r^*$ is the MBR of $c^*$. The
second inequality is because $p_0$ is the optimal solution for
the MaxRS problem on $R$. The third inequality is Lemma 6.
The last equality is because
ApproxMaxCRS returns the best point among $p_0, \ldots, p_4$. □

Theorem 4. The 1/4 approximation ratio is tight for the
ApproxMaxCRS algorithm.

Proof. We prove this by giving a worst-case example.
Consider an instance of the transformed MaxCRS problem
in Figure 11, where each circle has weight 1. In this case,
we may end up finding a max-region centered at $p_0$ using
the ExactMaxRS algorithm (notice that both $p_0$ and $p^*$ are
covered by 4 MBRs). In this case, we will choose one of
$p_1, \ldots, p_4$ as an approximate solution. Since each of $p_1, \ldots, p_4$
is covered by only 1 circle, our answer is (1/4)-approximate,
because the optimal answer $p^*$ is covered by 4 circles. □




Figure 11: Theorem 4



7. EMPIRICAL STUDY 

In this section, we evaluate the performance of our algo- 
rithms with extensive experiments. 
Dataset    Cardinality
-------    -----------
UX         19,499
NE         123,593

Table 2: The cardinalities of real datasets



Parameter                             Default value
------------------------------------  ---------------------------
Cardinality ($|O|$)                   250,000
Block size                            4KB
Buffer size                           256KB (real dataset),
                                      1024KB (synthetic dataset)
Space size                            1M x 1M
Rectangle size ($d_1 \times d_2$)     1K x 1K
Circle diameter ($d$)                 1K

Table 3: The default values of parameters

7.1 Environment Setting 

We use both real and synthetic datasets in the
experiments. We first generate synthetic datasets under uniform
distribution and Gaussian distribution. We set the
cardinalities of the datasets (i.e., $|O|$) to be from 100,000 to 500,000
(default 250,000). The range of each coordinate is set to be
$[0, 4|O|]$ (default [0, 1000000]).
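
A generator of this kind could look like the sketch below. Only the coordinate range $[0, 4|O|]$ and the two distribution families come from the text; the unit weights, the Gaussian mean and standard deviation, the rejection of out-of-range samples, and the function names are assumptions made purely for illustration.

```python
import random


def uniform_points(n, seed=0):
    """n unit-weight points drawn uniformly from [0, 4n] x [0, 4n]."""
    rng = random.Random(seed)
    hi = 4 * n
    return [(rng.uniform(0, hi), rng.uniform(0, hi), 1.0) for _ in range(n)]


def gaussian_points(n, seed=0):
    """n unit-weight points from a Gaussian centered in [0, 4n] x [0, 4n].

    The mean (center of the space) and standard deviation (one eighth of
    the range) are hypothetical; samples outside the space are redrawn.
    """
    rng = random.Random(seed)
    hi = 4 * n
    mean, std = hi / 2, hi / 8
    points = []
    while len(points) < n:
        x, y = rng.gauss(mean, std), rng.gauss(mean, std)
        if 0 <= x <= hi and 0 <= y <= hi:
            points.append((x, y, 1.0))
    return points
```

With the default cardinality of 250,000, both functions produce coordinates in [0, 1000000], matching the default range given above.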

We also use two real datasets, North East (NE) dataset 
and United States of America and Mexico (UX) dataset, 
downloaded from the R-tree Portal [1]. The cardinalities 
of datasets are presented in Table 2. For both datasets, we 
normalize the range of coordinates to [0, 1000000]. 

Since no method is directly applicable to the MaxRS
problem in spatial databases, we externalize the in-memory
algorithm [11, 14] for the max-rectangle enclosing problem so that
it can be compared with our ExactMaxRS algorithm. In fact, an
externalization of this in-memory algorithm was already
proposed by Du et al. [9], who originally devised it for
processing their optimal-location queries. They present two
algorithms based on plane-sweep, called Naive Plane Sweep
and aSB-Tree, which are also applicable to the MaxRS
problem, even though their main algorithm, which is based on a
preprocessed structure called the Vol-Tree, cannot be used for the
MaxRS problem.

As a performance metric, we use the number of I/O's, 
precisely the number of transferred blocks during the entire 
process. We do not consider CPU time, since it is dominated 
by I/O cost. 

We fix the block size to 4KB, and set the buffer size to 
256KB for real datasets and 1024KB for synthetic datasets 
by default. This is because the cardinalities of the real 
datasets are relatively small (recall that we consider a mas- 
sive dataset which cannot be fully loaded into the main 
memory). Also, for the MaxRS problem, we set the rect- 
angle size to 1000 x 1000 by default. Similarly, for the Max- 
CRS problem, we set the circle diameter to 1000 by default. 
All the default values of parameters are presented in Table 
3. 

We implement all the algorithms in Java, and conduct all 
the experiments on a PC equipped with Intel Core i7 CPU 
3.4GHz and 16GB memory. 

7.2 Experimental Results 

In this section, we present our experimental results. First,
we examine the performance of alternative algorithms in
terms of I/O cost by varying the parameters. Note that the
I/O cost is in log scale in all the relevant graphs. We finally
show the quality of approximation of our ApproxMaxCRS
algorithm in Section 7.2.5.

7.2.1 Effect of the Dataset Cardinalities

Figure 12 shows the experimental results for varying the
total number of objects in the dataset. The results for both
Gaussian distribution and uniform distribution show that our
ExactMaxRS is much more efficient than the algorithms
based on plane-sweep. In particular, even if the dataset gets
larger, the ExactMaxRS algorithm achieves performance
similar to that on the smaller dataset, which effectively shows
that our algorithm is scalable to datasets of a massive size.
Figure 12: Effect of the dataset cardinalities.
(a) Gaussian distribution; (b) uniform distribution.



7.2.2 Effect of the Buffer Size 

Figure 13 shows the experimental results for varying the
buffer size. Even though all the algorithms exhibit better
performance as the buffer size increases, the ExactMaxRS
algorithm is more sensitive to the size of the buffer than the
others. This is because our algorithm uses the buffer more
effectively. As proved in Theorem 2, the I/O complexity
of ExactMaxRS is $O((N/B) \log_{M/B} (N/B))$, which means
the larger $M$, the smaller the factor $\log_{M/B} (N/B)$.
Nevertheless, once the buffer size is larger than a certain size,
the ExactMaxRS algorithm also shows behavior similar to
the others, since the entire I/O cost will be dominated by
$O(N/B)$, i.e., linear cost.
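
The effect of $M$ on the logarithmic factor can be checked with a few lines of integer arithmetic. The helper below is a hypothetical illustration, not part of the experimental code: it takes the dataset size and the buffer size in blocks (that is, $N/B$ and $M/B$) and returns the smallest number of levels $L$ with $(M/B)^L \ge N/B$. The block counts used in the usage note are made up for the example.

```python
def recursion_levels(n_blocks: int, m_blocks: int) -> int:
    """Smallest integer L such that m_blocks ** L >= n_blocks.

    This equals ceil(log_{M/B}(N/B)) for n_blocks >= 1, computed without
    floating-point logarithms.
    """
    if m_blocks < 2:
        raise ValueError("the buffer must hold at least two blocks")
    levels = 0
    reach = 1
    while reach < n_blocks:
        reach *= m_blocks
        levels += 1
    return levels
```

With 4KB blocks, a 256KB buffer holds 64 blocks and a 4096KB buffer holds 1024 blocks. For a hypothetical input of 100,000 blocks, `recursion_levels(100000, 64)` gives 3 levels while `recursion_levels(100000, 1024)` gives 2, and once the buffer is large relative to the input the factor stops shrinking, which is consistent with the flattening described above.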
Figure 13: Effect of the buffer size.
(a) Gaussian distribution; (b) uniform distribution.

7.2.3 Effect of the Range Size 

Figure 14 shows the experimental results for varying the
range parameters. Without loss of generality, we use the
same value for each dimension, i.e., each rectangle is a square.
It is observed that the ExactMaxRS algorithm is less
influenced by the size of the range than the other algorithms. This
is because as the size of the range increases, the probability that
rectangles overlap also increases in the algorithms based on
plane-sweep, which means that the number of interval
insertions will also increase. Meanwhile, the ExactMaxRS
algorithm is not much affected by the overlapping probability.
Figure 14: Effect of the range size.
(a) Gaussian distribution; (b) uniform distribution.

7.2.4 Results of Real Datasets 

We conduct the same kind of experiments on real datasets 
except varying cardinalities. As shown in Table 2, dataset 
UX is not only much smaller, but also sparser than NE, since 
the domains of the data space are the same, i.e., 1M x 1M. 
In fact, we can regard UX as a macro view of NE. 

Overall trends of the graphs are similar to the results in 
synthetic datasets, as shown in Figures 15 and 16. Note 
that in Figure 15(a), when the buffer size gets larger than 
512KB, the naive plane sweep algorithm shows the best per- 
formance. This is because UX is small enough to be loaded 
into a buffer of size 512KB, which causes only one linear 
scan. However, we can see that the aSB-Tree cannot be 
loaded into a buffer of the same size, since the aSB-Tree re- 
quires more space due to the other information in the tree 
structure such as pointers of child nodes. 

In this paper, since we focus on massive datasets that 
should be stored in external memory, we can claim that 
our ExactMaxRS algorithm is much more efficient than the 
others for large datasets such as NE. 
Figure 15: Effect of the buffer size on real datasets

7.2.5 The Quality of Approximation 

Finally, we evaluate the quality of approximation obtained
from the ApproxMaxCRS algorithm in Figure 17. Since
the quality can be different when the diameter $d$ changes,
we examine the quality by varying $d$ on both synthetic and
real datasets. Optimal answers are obtained by
implementing a theoretical algorithm [8] that has time complexity
$O(n^2 \log n)$ (and therefore is not practical). We observe
that when the diameter gets larger, the quality of
approximation becomes higher and more stable, since more
objects are included in the given range. Even though
theoretically our ApproxMaxCRS algorithm guarantees only the
(1/4)-approximation bound, the average approximation ratio in
practice is close to 0.9, which is much larger than 1/4.

Figure 16: Effect of the range size on real datasets
Figure 17: Approximation quality



8. CONCLUSIONS AND FUTURE WORK 

In this paper, we solve the MaxRS problem in spatial 
databases. This problem is useful in many scenarios such 
as finding the most profitable service place and finding the 
most serviceable place, where a certain size of range should 
be associated with the place. For the MaxRS problem, we 
propose the first external-memory algorithm, ExactMaxRS, 
with a proof that the ExactMaxRS algorithm correctly solves 
the MaxRS problem in optimal I/O's. Furthermore, we 
propose an approximation algorithm, ApproxMaxCRS, for 
the MaxCRS problem that is a circle version of the MaxRS 
problem. We also prove that the ApproxMaxCRS algorithm
gives a (1/4)-approximate solution to the exact solution for
the MaxCRS problem. Through extensive experiments on
both synthetic and real datasets, we demonstrate that the 
proposed algorithms are also efficient in practice. 

We are now considering several directions for our future
work. First, it should be feasible to extend our
algorithm to deal with the Max$k$RS problem or the MinRS problem.
Second, focusing on the MaxCRS problem, we are planning
to improve the algorithm to give a tighter bound. Finally,
although our ExactMaxRS algorithm is proved optimal in
terms of I/O cost, so far we do not use any preprocessed
structure. Therefore, our next direction can be to reduce
the searching cost by using a newly invented index.